A review of structured document retrieval (SDR) technology to improve information access performance in engineering document management

نویسندگان

  • Shaofeng Liu
  • Chris A. McMahon
  • Steve J. Culley
چکیده

Information retrieval (IR) is a well-established research and development area. Document formats such as SGML (Standard Generalised Markup Language) and XML (eXtensible Mark-up Language) have become widely used in recent years. Traditional IR systems demonstrate limitations when dealing with such documents, which motivated the emergence of structured document retrieval (SDR) technology intending to overcome these limitations. This paper reviews the work carried out from the inception to the development and application of SDR in engineering document management. The key issues of SDR are discussed and the state of the art of SDR to improve information access performance has been surveyed. A comparison of selected papers is provided and possible future research directions identified. The paper concludes with the expectation that SDR will make a positive impact on the process of engineering document management from document construction to its delivery in the future, and undoubtedly provide better information retrieval performance in terms of both precision and functionality. # 2007 Elsevier B.V. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword Spotting is a well-known method in document image retrieval. In this method, Search in document images is based on query word image. In this Paper, an approach for document image retrieval based on keyword spotting has been proposed. In proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method is used in this paper to imp...

متن کامل

Investigating the Impact of Authors’ Rank in Bibliographic Networks on Expertise Retrieval

Background and Aim: this research investigates the impact of authors’ rank in Bibliographic networks on document-centered model of Expertise Retrieval. Its purpose is to find out what kind of authors’ ranking in bibliographic networks can improve the performance of document-centered model.   Methodology: Current research is an experimental one. To operationalize research goals, a new test colle...

متن کامل

Document Analysis And Classification Based On Passing Window

In this paper we present Document analysis and classification system to segment and classify contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...

متن کامل

AT&T at TREC-7 SDR Track

AT&T participated in the Spoken Document Retrieval (SDR) track of TREC-7. Our speech retrieval system uses modern Information Retrieval (IR) methods in conjunction with in-house automatic speech recognition. The novel feature of our TREC-7 work is the use of document expansion to reduce the performance loss due to ASR errors. Results show that retrieval from automatic transcriptions of speech i...

متن کامل

A New Document Embedding Method for News Classification

Abstract- Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. There is lots of news spreading on the web. A text classifier can categorize news automatically and this facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computers in Industry

دوره 59  شماره 

صفحات  -

تاریخ انتشار 2008